The Multivariate Normal Distribution and Related Topics

Recall that if $X$ is a vector of continuous random variables with a joint probability density function $f$, and if $Y = h(X)$, where $h$ is a one-to-one, continuously differentiable function with inverse $g$, so that $X = g(Y)$, then the density of $Y$ is given by
$$f_Y(y) = f(g(y))\,\lvert J \rvert$$
Details: Here $J$ is the Jacobian determinant of $g$.
In particular, if $Y = AX$, then

$$f_Y(y) = f(A^{-1}y)\,\lvert \det(A^{-1}) \rvert$$

provided $A$ is invertible.
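As a quick numerical check, here is a minimal sketch (assuming Python with NumPy and SciPy; the matrix $A$ below is an arbitrary invertible choice for illustration) that evaluates this change-of-variables formula for a standard bivariate normal $Z$ and compares it with the directly known density of $Y = AZ \sim N(0, AA')$:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Density of Z: two independent N(0,1) components.
f_Z = multivariate_normal(mean=np.zeros(2), cov=np.eye(2)).pdf

# An arbitrary invertible matrix A (hypothetical choice for illustration).
A = np.array([[2.0, 0.5],
              [0.0, 1.0]])
A_inv = np.linalg.inv(A)

# Change-of-variables formula: f_Y(y) = f_Z(A^{-1} y) |det(A^{-1})|.
def f_Y(y):
    return f_Z(A_inv @ y) * abs(np.linalg.det(A_inv))

# Cross-check against the known result Y = A Z ~ N(0, A A').
y = np.array([1.0, -0.5])
print(f_Y(y))
print(multivariate_normal(mean=np.zeros(2), cov=A @ A.T).pdf(y))
# the two printed values should agree
```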
The Multivariate Normal Distribution

Details: Consider independent, identically distributed random variables $Z_1, \ldots, Z_n \sim N(0,1)$,

$$\underline{Z} = \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix}$$

and let $\underline{Y} = A\underline{Z} + \underline{\mu}$, where $A$ is an invertible $n \times n$ matrix and $\underline{\mu} \in \mathbb{R}^n$ is a vector, so that $\underline{Z} = A^{-1}(\underline{Y} - \underline{\mu})$.
Then the p.d.f. of $\underline{Y}$ is given by

$$f_{\underline{Y}}(\underline{y}) = f_{\underline{Z}}(A^{-1}(\underline{y} - \underline{\mu}))\,\lvert \det(A^{-1}) \rvert$$
But the joint p.d.f. of $\underline{Z}$ is the product of the p.d.f.'s of $Z_1, \ldots, Z_n$, so $f_{\underline{Z}}(\underline{z}) = f(z_1) \cdot f(z_2) \cdots f(z_n)$, where

$$f(z_i) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z_i^2}{2}}$$
and hence

$$\begin{aligned} f_{\underline{Z}}(\underline{z}) &= \prod_{i=1}^n \frac{1}{\sqrt{2\pi}} e^{-\frac{z_i^2}{2}} \\ &= \left( \frac{1}{\sqrt{2\pi}} \right)^n e^{-\frac{1}{2}\sum_{i=1}^n z_i^2} \\ &= \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2}\underline{z}'\underline{z}} \end{aligned}$$

since

$$\sum_{i=1}^n z_i^2 = \Vert \underline{z} \Vert^2 = \underline{z} \cdot \underline{z} = \underline{z}'\underline{z}$$
The joint p.d.f. of $\underline{Y}$ is therefore

$$\begin{aligned} f_{\underline{Y}}(\underline{y}) &= f_{\underline{Z}}(A^{-1}(\underline{y} - \underline{\mu}))\,\lvert \det(A^{-1}) \rvert \\ &= \frac{1}{(2\pi)^{n/2}} e^{-\frac{1}{2}(A^{-1}(\underline{y}-\underline{\mu}))'(A^{-1}(\underline{y}-\underline{\mu}))} \frac{1}{\lvert \det(A) \rvert} \end{aligned}$$

We can write $\det(AA') = \det(A)\det(A') = \det(A)^2$, so $\lvert \det(A) \rvert = \sqrt{\det(AA')}$, and if we write $\boldsymbol{\Sigma} = AA'$, then

$$\lvert \det(A) \rvert = \lvert \boldsymbol{\Sigma} \rvert^{1/2}$$
Also, note that
$$(A^{-1}(\underline{y}-\underline{\mu}))'(A^{-1}(\underline{y}-\underline{\mu})) = (\underline{y} - \underline{\mu})'(A^{-1})'A^{-1}(\underline{y} - \underline{\mu}) = (\underline{y} - \underline{\mu})'\boldsymbol{\Sigma}^{-1}(\underline{y}-\underline{\mu})$$

since $(A^{-1})'A^{-1} = (A')^{-1}A^{-1} = (AA')^{-1} = \boldsymbol{\Sigma}^{-1}$.
We can now write
$$f_{\underline{Y}}(\underline{y}) = \frac{1}{(2\pi)^{n/2} \lvert \boldsymbol{\Sigma} \rvert^{1/2}} e^{-\frac{1}{2}(\underline{y}-\underline{\mu})'\boldsymbol{\Sigma}^{-1}(\underline{y}-\underline{\mu})}$$
This is the density of the multivariate normal distribution.
Note that
$$E[\underline{Y}] = \underline{\mu}$$

$$\mathrm{Var}[\underline{Y}] = \mathrm{Var}[A\underline{Z}] = A\,\mathrm{Var}[\underline{Z}]\,A' = AIA' = \boldsymbol{\Sigma}$$

Notation: $\underline{Y} \sim N(\underline{\mu}, \boldsymbol{\Sigma})$
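To make the construction concrete, the following sketch (Python with NumPy; the matrix $A$ and vector $\underline{\mu}$ are hypothetical values chosen for illustration) simulates $\underline{Y} = A\underline{Z} + \underline{\mu}$ and checks that the empirical mean and covariance approach $\underline{\mu}$ and $\boldsymbol{\Sigma} = AA'$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_samples = 3, 200_000

# Hypothetical invertible A and mean vector mu, chosen for illustration.
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
mu = np.array([1.0, -2.0, 0.5])

# Construct Y = A Z + mu from i.i.d. standard normal draws Z.
Z = rng.standard_normal((n_samples, n))
Y = Z @ A.T + mu

# Empirical moments should approach mu and Sigma = A A'.
print(Y.mean(axis=0))           # ~ mu
print(np.cov(Y, rowvar=False))  # ~ A @ A.T
```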
The general univariate normal distribution with density
$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(y-\mu)^2}{2\sigma^2}}$$
is a special case of the multivariate version.
Details: Further, if $Z \sim N(0,1)$, then clearly $X = aZ + \mu \sim N(\mu, \sigma^2)$, where $\sigma^2 = a^2$.
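A one-line check of this special case (Python with SciPy; $\mu$, $\sigma$, and $y$ are arbitrary illustrative values): with $n = 1$ and $\boldsymbol{\Sigma} = [\sigma^2]$, the multivariate density agrees with the univariate one.

```python
from scipy.stats import norm, multivariate_normal

# Hypothetical values for illustration.
mu, sigma, y = 1.5, 2.0, 0.7

# Univariate N(mu, sigma^2) density vs. 1-dimensional multivariate density.
print(norm(loc=mu, scale=sigma).pdf(y))
print(multivariate_normal(mean=[mu], cov=[[sigma**2]]).pdf([y]))
# the two printed values agree
```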
If $Y \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is a random vector of length $n$ and $A$ is an $m \times n$ matrix of rank $m \leq n$, then $AY \sim N(A\boldsymbol{\mu}, A\boldsymbol{\Sigma}A')$.
Details: If $Y \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is a random vector of length $n$ and $A$ is an $m \times n$ matrix of rank $m \leq n$, then $AY \sim N(A\boldsymbol{\mu}, A\boldsymbol{\Sigma}A')$.

To prove this, set up an $(n-m) \times n$ matrix $B$ so that the $n \times n$ matrix $C$, formed by combining the rows of $A$ and $B$, has full rank $n$. The density of $CY$ is then easy to derive, and it factors into a product of terms, only one of which contains $AY$; this factor gives the density of $AY$.
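Rather than reproducing the factorization argument, here is a simulation sketch (Python with NumPy; $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, and the rank-2 matrix $A$ are hypothetical choices for illustration) checking that the empirical moments of $AY$ match $A\boldsymbol{\mu}$ and $A\boldsymbol{\Sigma}A'$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mu, Sigma (n = 3) and a rank-2 matrix A (m = 2).
mu = np.array([1.0, 0.0, -1.0])
L = np.array([[1.0, 0.0, 0.0],
              [0.4, 1.0, 0.0],
              [0.1, 0.2, 1.0]])
Sigma = L @ L.T                  # a valid covariance matrix by construction
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, -1.0]])

# Draw Y ~ N(mu, Sigma) and transform each sample.
Y = rng.multivariate_normal(mu, Sigma, size=200_000)
AY = Y @ A.T

# Empirical moments should approach A mu and A Sigma A'.
print(AY.mean(axis=0))           # ~ A @ mu
print(np.cov(AY, rowvar=False))  # ~ A @ Sigma @ A.T
```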
The OLS Estimator

Suppose $Y \sim N(X\beta, \sigma^2 I)$.
The ordinary least squares estimator, when the $n \times p$ design matrix $X$ is of full rank $p$, where $p \leq n$, is:
$$\hat{\beta} = (X'X)^{-1}X'Y$$
The random variable that describes the process generating the data and the estimate is:
$$b = (X'X)^{-1}X'Y$$
It follows that
$$\hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1})$$
Details: Suppose $Y \sim N(X\beta, \sigma^2 I)$.

The ordinary least squares estimator, when the $n \times p$ design matrix $X$ is of full rank $p$, is:

$$\hat{\beta} = (X'X)^{-1}X'Y$$

The random variable that describes the process generating the data and the estimate is:

$$b = (X'X)^{-1}X'Y$$
If $B = (X'X)^{-1}X'$, then we know that

$$BY \sim N(BX\beta, B(\sigma^2 I)B')$$
Note that
$$BX\beta = (X'X)^{-1}X'X\beta = \beta$$
and
$$\begin{aligned} B(\sigma^2 I)B' &= \sigma^2 (X'X)^{-1}X'[(X'X)^{-1}X']' \\ &= \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} \\ &= \sigma^2 (X'X)^{-1} \end{aligned}$$

It follows that
$$\hat{\beta} \sim N(\beta, \sigma^2 (X'X)^{-1})$$
The earlier results regarding the multivariate Gaussian distribution also show that the vector of parameter estimates will be Gaussian even if the original $Y$-variables are not independent.
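The sampling distribution of $\hat{\beta}$ can be checked by simulation. The sketch below (Python with NumPy; the design matrix, true $\beta$, and $\sigma$ are hypothetical choices for illustration) repeatedly draws $Y \sim N(X\beta, \sigma^2 I)$, computes $\hat{\beta} = (X'X)^{-1}X'Y$, and compares the empirical mean and covariance of the estimates with $\beta$ and $\sigma^2 (X'X)^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical design matrix (n = 50, p = 2): intercept plus one covariate.
n, p = 50, 2
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
beta = np.array([2.0, -1.0])   # true parameters (illustrative)
sigma = 0.5

# B = (X'X)^{-1} X' maps each simulated Y to its estimate beta_hat = B Y.
XtX_inv = np.linalg.inv(X.T @ X)
B = XtX_inv @ X.T
estimates = np.array([B @ (X @ beta + sigma * rng.standard_normal(n))
                      for _ in range(100_000)])

# Empirical moments of beta_hat vs. the theoretical ones.
print(estimates.mean(axis=0))            # ~ beta
print(np.cov(estimates, rowvar=False))   # ~ sigma**2 * XtX_inv
```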